Music identification via audio fingerprinting has been an active research field in recent years. In real-world environments, music queries are often deformed by various interferences, which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Robustness therefore plays a crucial role in music identification techniques. In this paper, we propose to use scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against severe time stretching and pitch shifting applied simultaneously. In addition, a locality sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factors of time stretching and pitch shifting that music queries may have undergone can be accurately estimated.
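To make the sub-fingerprint extraction step concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes librosa for the spectrogram and OpenCV's SIFT, and the file name and STFT parameters are hypothetical choices rather than values from the paper.

```python
import numpy as np
import librosa
import cv2

# Hypothetical query clip; any mono audio file would do.
y, sr = librosa.load("query.wav", sr=22050, mono=True)

# Magnitude spectrogram in dB, scaled to an 8-bit grayscale image.
hop_length = 256
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=hop_length))
S_db = librosa.amplitude_to_db(S, ref=np.max)
img = cv2.normalize(S_db, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# SIFT keypoints/descriptors on the spectrogram image serve as sub-fingerprints.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint's (x, y) position gives a time-frequency location;
# descriptors are 128-dimensional vectors suitable for LSH-based retrieval.
for kp, desc in zip(keypoints[:5], descriptors[:5]):
    t_sec = kp.pt[0] * hop_length / sr   # approximate time in seconds
    f_bin = kp.pt[1]                     # frequency bin index in the image
    print(f"t={t_sec:.2f}s, bin={f_bin:.0f}, desc[:4]={desc[:4]}")
```

Comparing the (time, frequency) coordinates of matched keypoints between a query and a reference in this representation is what allows the stretching and shifting factors mentioned above to be estimated as coordinate ratios.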